Hyperspectral images of grapevine leaves including healthy leaves and leaves with biotic and abiotic symptoms

A hyperspectral imaging database was collected on two hundred and four grapevine leaves. Leaves were measured with a hyperspectral camera in the visible/near-infrared spectral range under controlled conditions. The dataset contains hyperspectral acquisitions of leaves from seven grape varieties. For each variety, acquisitions were performed on healthy leaves and on leaves with foliar symptoms caused by different grapevine diseases showing clear symptoms of biotic or abiotic stress on other organs. For each leaf, chemical measurements such as chlorophyll and flavonol contents were also performed.


Methods
Samples and analyses. Leaves were collected in September 2020 in the south of France (GPS coordinates: 43.84208931745156, 1.8538190583140841). Infected leaves were chosen to best represent the variability of the available symptoms in terms of severity and stage of infection. Similar numbers of leaves were collected from red and white varieties. In total, two hundred and four leaves were collected in the field. All information about the leaves and their respective symptoms is summarized in Tables 1-3.
Each leaf, and the vine from which it was taken, was diagnosed by a phytopathology expert. Leaves were taken from the front face, in the middle of the canopy, to avoid the younger and older organs, which can show a different physiological behaviour. Healthy leaves were selected in the same regions and were confirmed to be free of any symptom. However, some of the healthy samples can exhibit slight mechanical or chemical wounds (due to protection and management operations) and slight damage caused by insects. To guarantee that the physiological status of the leaves was not affected by the delay between collection and acquisition, leaves were transported under controlled temperature and hydric conditions.

Foliar content measurements.
For each leaf, foliar content measurements (see Figs. 1-3) were made before sampling. These measurements were carried out with a Dualex 311 scientific+™ (Force-A, Orsay, France) to provide the chlorophyll a + b content (µg/cm²), the epidermal flavonol content (in % of relative absorbance) and the crop nitrogen status index (NBI).

Hyperspectral image acquisition.
Hyperspectral images were acquired on each individual leaf under controlled conditions in the laboratory. Leaf images were acquired with a hyperspectral camera (IQ, Specim, Finland). Imaging of grapevine leaves was carried out in the spectral range of 400-900 nm, with a spectral resolution of 7 nm. Illumination was provided by a halogen lamp (Arrilite 750 Plus ARRI, Munich, Germany), and constant angles of −50° and 50° were maintained between the axes of the halogen lamp and the axis of the hyperspectral camera.
For each sample image, the intensity of the reflected light $I(\lambda)$ was measured. The dark current $I_d(\lambda)$, i.e. the signal without light, was recorded for each acquisition and then subtracted. The intensity $I_0(\lambda)$ of the light reflected by a certified standard reference (Labsphere, SRS-40-010) was measured to standardise the spectra and to limit the effect of non-linearities of the instrumentation components (light source, lens, fibers and spectrometer). From these measurements, a reflectance image $R(\lambda)$ was calculated for each sample, as follows:

$$R(\lambda) = \frac{I(\lambda) - I_d(\lambda)}{I_0(\lambda) - I_d(\lambda)}$$

The first table file (description.csv) contains the experiment factors and the reference measurements. The second table file (description_variables.csv) contains information about the variables used in the first table file. The folder 'Data/' contains 204 folders corresponding to the 204 hyperspectral image acquisitions. In this data directory (see Fig. 4), each folder is named with the acquisition date followed by the acquisition number (YYYY-MM-DD_NBR). Reflectance image files are located in the 'results' directories, and the acquisition date and acquisition number are specified in the file name (REFLECTANCE_YYYY-MM-DD_NBR.dat). These reflectance images are stored in ENVI format, which consists of a binary data file (.dat) and a header file (.hdr). Each reflectance image (.dat) is around 214 MB. The first and second dimensions correspond to the spatial position (pixels), forming an image of 512 × 512 pixels (see Fig. 5). The third dimension corresponds to the spectral variables, with two hundred and four spectral bands. For each image acquisition, the raw image, white reference and dark measurements are available in the directory named 'capture'. All these raw data are also stored in ENVI format.
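As an illustration, the reflectance calculation described above (dark-current subtraction followed by normalisation by the reference signal) can be sketched with NumPy. This is a minimal sketch and not the processing code used to build the dataset: the function name `reflectance`, the `eps` guard against a null denominator and the toy array values are assumptions made for the example.

```python
import numpy as np

def reflectance(raw, white, dark, eps=1e-6):
    """Element-wise (raw - dark) / (white - dark).

    raw, white and dark are arrays that broadcast to a common shape,
    e.g. (rows, cols, bands). Entries where |white - dark| < eps are
    returned as NaN instead of dividing by (almost) zero.
    """
    raw = np.asarray(raw, dtype=np.float64)
    white = np.asarray(white, dtype=np.float64)
    dark = np.asarray(dark, dtype=np.float64)
    denom = white - dark
    denom = np.where(np.abs(denom) < eps, np.nan, denom)
    return (raw - dark) / denom

# Toy data (hypothetical values): 2 x 2 pixels, 3 bands.
dark = np.full((2, 2, 3), 100.0)
white = np.full((2, 2, 3), 4100.0)
raw = dark + 0.25 * (white - dark)  # signal at 25 % of the reference

R = reflectance(raw, white, dark)
print(R[0, 0])  # every value equals 0.25
```

In the dataset, the three inputs would correspond to the raw image, white reference and dark measurements stored in the 'capture' directory of each acquisition.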
For each hyperspectral acquisition, a metadata file containing information about the acquisition is produced. In this metadata file, an identification key called global_tag links the image to the experimental factors and to the reference values.
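The directory naming convention and the file size given above can be checked with a few lines of Python. This is only a sketch: the helper `reflectance_paths`, the example date and acquisition number are hypothetical, and the 4 bytes per sample are an assumption (consistent with the reported size of around 214 MB) that should be verified against the data type declared in each .hdr file.

```python
from pathlib import PurePosixPath

def reflectance_paths(root, date, number):
    """Build the .dat and .hdr paths of one reflectance image.

    Folders are named 'YYYY-MM-DD_NBR' and files
    'REFLECTANCE_YYYY-MM-DD_NBR', inside a 'results' sub-directory.
    """
    acquisition = f"{date}_{number}"
    results = PurePosixPath(root) / acquisition / "results"
    stem = f"REFLECTANCE_{acquisition}"
    return results / f"{stem}.dat", results / f"{stem}.hdr"

# Hypothetical acquisition date and number, for illustration only.
dat_path, hdr_path = reflectance_paths("Data", "2020-09-10", "001")
print(dat_path)

# Expected size of one reflectance cube.
ROWS, COLS, BANDS = 512, 512, 204
BYTES_PER_SAMPLE = 4  # assumption: 32-bit samples
expected_bytes = ROWS * COLS * BANDS * BYTES_PER_SAMPLE
print(f"{expected_bytes / 1e6:.1f} MB")  # 213.9 MB
```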

Technical Validation
We analysed part of this dataset in a first publication to classify images of diseased (flavescence dorée) and healthy leaves 9. In that study, we proposed a methodology based on multivariate curve resolution-alternating least squares (MCR-ALS) and factorial discriminant analysis (FDA), and classified each leaf pixel of each image. Over all the pixels to be classified (both infected and healthy), the classification rate reached 85.1%. Classification at the leaf (image) level was also investigated: each image was assigned the majority class among its pixels. Out of the thirty-seven test images, only two were misclassified with this method.
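The image-level decision rule described above (assigning to a leaf the majority class among its pixels) can be sketched as follows. The pixel labels are made-up values and the function name `classify_leaf` is hypothetical; the MCR-ALS and FDA steps that produce the pixel-level predictions are not reproduced here.

```python
from collections import Counter

def classify_leaf(pixel_labels):
    """Return the most frequent class among the pixel-level predictions.

    In case of a tie, the class encountered first is returned.
    """
    counts = Counter(pixel_labels)
    label, _ = counts.most_common(1)[0]
    return label

# Made-up pixel predictions for one leaf image.
pixels = ["healthy"] * 40 + ["diseased"] * 60
leaf_label = classify_leaf(pixels)
print(leaf_label)  # diseased

# Image-level rate implied by 2 misclassified images out of 37.
image_accuracy = (37 - 2) / 37
print(f"{image_accuracy:.3f}")  # 0.946
```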
The aforementioned publication was based on a subset of the whole dataset. Indeed, exploiting this many spectral images to differentiate so many symptoms is a real challenge. Because of the complexity of this dataset and the difficulty of providing masks for each symptom, the initial exploration relies on the average spectrum of the database (see Fig. 6) and on the average spectra per variety (see Fig. 7) and per symptom (see Fig. 8). The average spectra per leaf are then explored with a Principal Component Analysis (PCA).
Mean spectrum and standard deviation of the entire database. The mean spectrum and standard deviation are calculated from all spectra of the two hundred and four measured leaves (see Fig. 6). The average spectrum is typical of a vegetation spectrum. Low values between 400 nm and 500 nm are mainly related to carotenoid and chlorophyll (a + b) contents. The characteristic large peak around 550 nm is attributed to the anthocyanin content. The spectral region between 620 nm and 680 nm is related to the chlorophyll content of the leaves. The red edge between 680 nm and 750 nm, which is also typical of vegetation, separates the visible spectral region related to pigments from the plateau between 750 nm and 1000 nm related to the leaf structure.
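A mean spectrum and its standard deviation over leaf pixels can be computed as in the sketch below. It assumes a reflectance cube of shape (rows, cols, bands) and a boolean leaf mask; such a mask is not described as part of the dataset and is a hypothetical input here, as are the function name and the toy values.

```python
import numpy as np

def mean_and_std_spectrum(cube, mask):
    """Mean and (population) standard deviation spectrum of masked pixels.

    cube: array of shape (rows, cols, bands).
    mask: boolean array of shape (rows, cols), True on leaf pixels.
    """
    spectra = cube[mask]  # shape (n_leaf_pixels, bands)
    return spectra.mean(axis=0), spectra.std(axis=0)

# Toy cube: 2 x 2 pixels, 3 bands; one background pixel is masked out.
cube = np.array([[[0.1, 0.2, 0.6], [0.3, 0.4, 0.8]],
                 [[0.0, 0.0, 0.0], [0.2, 0.3, 0.7]]])
mask = np.array([[True, True],
                 [False, True]])

mean_spec, std_spec = mean_and_std_spectrum(cube, mask)
print(mean_spec)  # approximately [0.2, 0.3, 0.7]
```

For the whole database, the same function could be applied to the leaf pixels of all images stacked together, memory permitting.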
Average spectra per variety and per symptom. From this database, average spectra are calculated per variety (see Fig. 7) and per symptom (see Fig. 8). Among the seven average spectra per variety (see Fig. 7), two spectral shapes can be identified in the region between 500 nm and 700 nm. The three spectra corresponding to the 'Duras', 'Fer' and 'Gamay' varieties have lower values around 550 nm, while the spectra corresponding to 'Chardonnay', 'Colombard', 'Loin de l'oeil' and 'Mauzac' have higher values. This difference seems to be related to the anthocyanin content of the leaves, which depends on whether the variety is red or white.
Figure 8 displays the average spectrum for each of the twelve symptoms. The spectrum corresponding to the healthy leaf modality shows the same features of a typical vegetation spectrum as described above (see Fig. 6). Although the spectra of each symptom are averaged across all grape varieties, differences are noticeable. For example, the 'deficiency' and 'chlorosis' symptoms differ from the other symptoms, with higher values from 500 nm to 650 nm. Two other symptoms ('buffalo treehopper' and 'water stress') also differ in the same spectral range, but with lower values. Interpreting the differences between the average spectra between 600 nm and 700 nm, as well as the dynamics of the red edge or the shape of the plateau, would require further processing.
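Average spectra per variety or per symptom, as discussed above, amount to a grouped mean. The sketch below averages per-leaf mean spectra by label; this is an assumption made for simplicity, since the averages could equally be computed over all pixels of each group. The function name and the toy spectra are hypothetical.

```python
import numpy as np

def group_mean_spectra(spectra, labels):
    """Average the rows of `spectra` that share the same label.

    spectra: array of shape (n_leaves, bands).
    labels: sequence of n_leaves labels (variety or symptom names).
    Returns a dict mapping each label to its mean spectrum.
    """
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    return {str(g): spectra[labels == g].mean(axis=0)
            for g in np.unique(labels)}

# Made-up per-leaf mean spectra (4 leaves, 3 bands).
spectra = [[0.1, 0.2, 0.5],
           [0.2, 0.4, 0.6],
           [0.3, 0.4, 0.7],
           [0.4, 0.6, 0.8]]
varieties = ["Duras", "Gamay", "Duras", "Gamay"]

group_means = group_mean_spectra(spectra, varieties)
print(group_means["Duras"])  # approximately [0.2, 0.3, 0.6]
```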

Principal component analysis.
For each image, an average spectrum was calculated from the leaf pixels. A principal component analysis was then performed on these two hundred and four average spectra. Figure 9 shows the PCA scores obtained for the first two components. A few (variety, symptom) combinations show a particular behaviour on this score plot. For example, 'Chardonnay' leaves with 'flavescence dorée' have positive scores on both axes, opposite to the negative scores of the healthy 'Chardonnay' modality. For other varieties and symptoms, scores are more evenly distributed along the two axes. This can be explained by the preponderance of the 'flavescence dorée' and 'healthy' observations. Another notable observation is the clear separation of some observations from the rest of the group, such as 'senescence' combined with the 'Duras' and 'Mauzac' varieties on PC1 and 'senescence' combined with the 'Loin de l'oeil' variety on PC2.
These results should be considered in relation to the loadings of the principal components concerned (see Figs. 10, 11). The first component corresponds to an inverted overall shape of the spectra (see Fig. 6), which could correspond to the total amount of signal received by the camera. The second component shows positive loadings between 400 nm and 600 nm, with a strong positive value in the 550 nm region, which is related to the anthocyanin content.
This technical validation was carried out only on the first two principal components of a PCA. The availability of this dataset allows further study through other principal components or, more generally, using other methods. This dataset offers great perspectives for further study, such as classification capabilities according to confounding factors, assessment of the spectral variability of symptoms according to variety, or improvement of the labelling process by selecting only the symptomatic areas of the leaf.
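The scores and loadings discussed above can be obtained from a matrix of per-leaf average spectra as in the following sketch, which computes a PCA through a singular value decomposition. The preprocessing is an assumption (mean-centring only), the function name is hypothetical, and random numbers stand in for the 204 average spectra.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of the rows of X through an SVD of the mean-centred matrix.

    Returns the scores (n_samples, n_components), the loadings
    (n_components, n_variables) and the fraction of variance
    explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Random stand-in for 204 per-leaf average spectra with 204 bands.
rng = np.random.default_rng(0)
X = rng.random((204, 204))

scores, loadings, explained = pca(X)
print(scores.shape, loadings.shape)  # (204, 2) (2, 204)
```

The sign of each component is arbitrary in such a decomposition: multiplying both the scores and the loadings of a component by −1 gives an equivalent solution, which is worth keeping in mind when interpreting a loading with an inverted shape.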

Usage Notes
This database has several advantages. Firstly, it provides hyperspectral images covering a multitude of grapevine symptoms across different grape varieties. One benefit of this dataset lies in the possibility of developing new analysis methods. On a more practical level, it can be used to study the potential of hyperspectral imaging to detect the proposed symptoms and to identify confounding factors. In addition, measurements were carried out under controlled conditions, which supports the reliability and accuracy of the data collected. In particular, measurements were carried out on whole leaves, which yields a large number of pixels per image.
This dataset has certain limitations. Firstly, the number of images available differs between grapevine symptoms, and this imbalance could potentially bias the results. Secondly, the small number of images available may be limiting for deep learning approaches.

Fig. 5 An RGB image reconstructed from a hyperspectral image.

Fig. 6 Mean spectrum and standard deviation of all leaf pixels.

Fig. 9 Scores obtained on the first two components.

Fig. 10 Loadings of the first component.

Fig. 11 Loadings of the second component.

Table 1. Number of healthy and infected leaves per variety.

Table 2. Number of images per symptom.

The dataset, mentioned scripts and algorithms are available in the INRAE data repository 15. This dataset contains two table files (description.csv and description_variables.csv), a folder called Data/ and another folder called Code/. The first table file (description.csv) contains the experiment factors and the reference measurements.

Table 3. Number of images per symptom per variety.